Distributed representation-based spoken word sense induction

نویسندگان

  • Justin Chiu
  • Yajie Miao
  • Alan W. Black
  • Alexander I. Rudnicky
چکیده

Spoken Term Detection (STD) or Keyword Search (KWS) techniques can locate keyword instances but do not differentiate between meanings. Spoken Word Sense Induction (SWSI) differentiates target instances by clustering according to context, providing a more useful result. In this paper we present a fully unsupervised SWSI approach based on distributed representations of spoken utterances. We compare this approach to several others, including the state-of-the-art Hierarchical Dirichlet Process (HDP). To determine how ASR performance affects SWSI, we used three different levels of Word Error Rate (WER), 40%, 20% and 0%; 40% WER is representative of online video, 0% of text. We show that the distributed representation approach outperforms all other approaches, regardless of the WER. Although LDA-based approaches do well on clean data, they degrade significantly with WER. Paradoxically, lower WER does not guarantee better SWSI performance, due to the influence of common locutions.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Noun Sense Induction and Disambiguation using Graph-Based Distributional Semantics

We introduce an approach to word sense induction and disambiguation. The method is unsupervised and knowledge-free: sense representations are learned from distributional evidence and subsequently used to disambiguate word instances in context. These sense representations are obtained by clustering dependency-based secondorder similarity networks. We then add features for disambiguation from het...

متن کامل

Graph Based Algorithms for Word Sense Induction and Disambiguation

This paper presents a survey of graph based methods for word sense induction and disambiguation. Many areas of Natural Language Processing like Word Sense Disambiguation (WSD), text summarization, keyword extraction make use of Graph based methods. The very idea behind graph based approach is to formulate the problems in graph setting and apply clustering to obtain a set of clusters (senses). T...

متن کامل

Improving Distributed Representation of Word Sense via WordNet Gloss Composition and Context Clustering

In recent years, there has been an increasing interest in learning a distributed representation of word sense. Traditional context clustering based models usually require careful tuning of model parameters, and typically perform worse on infrequent word senses. This paper presents a novel approach which addresses these limitations by first initializing the word sense embeddings through learning...

متن کامل

Sense-aware Semantic Analysis: A Multi-prototype Word Representation Model using Wikipedia

Human languages are naturally ambiguous, which makes it difficult to automatically understand the semantics of text. Most vector space models (VSM) treat all occurrences of a word as the same and build a single vector to represent the meaning of a word, which fails to capture any ambiguity. We present sense-aware semantic analysis (SaSA), a multi-prototype VSM for word representation based on W...

متن کامل

Sense-Aaware Semantic Analysis: A Multi-Prototype Word Representation Model Using Wikipedia

Human languages are naturally ambiguous, which makes it difficult to automatically understand the semantics of text. Most vector space models (VSM) treat all occurrences of a word as the same and build a single vector to represent the meaning of a word, which fails to capture any ambiguity. We present sense-aware semantic analysis (SaSA), a multi-prototype VSM for word representation based on W...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015